A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data



Abstract

Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and $\ell_2$-norm-based representations, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexity is at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, that is, data not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimate of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
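The "sampling, clustering, coding, and classifying" pipeline can be illustrated with a minimal sketch. The abstract does not give the algorithmic details, so the choices below are assumptions made for illustration only: an $\ell_2$-norm (ridge) self-representation for the in-sample graph, a bare-bones normalized spectral clustering with a simple k-means, ridge coding of out-of-sample points over the in-sample data, and assignment by minimum class-wise reconstruction residual. All function names and parameters (`cluster_large_scale`, `lam`, `n_samples`, and so on) are hypothetical and do not come from the paper.

```python
import numpy as np

def ridge_codes(D, Y, lam):
    # Closed-form l2-regularized codes of the columns of Y over the columns of D.
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)

def affinity(X, lam):
    # Assumed simplification: ridge self-representation, then zero the diagonal.
    C = ridge_codes(X, X, lam)
    np.fill_diagonal(C, 0.0)
    return np.abs(C) + np.abs(C).T

def kmeans(Z, k, n_iter=50):
    # Deterministic farthest-point initialization followed by Lloyd iterations.
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def spectral_labels(W, k):
    # Normalized Laplacian, k smallest eigenvectors, row normalization, k-means.
    d = np.maximum(W.sum(axis=1), 1e-12)
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    Z = vecs[:, :k]
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    return kmeans(Z, k)

def cluster_large_scale(X, k, n_samples, lam=0.1, seed=0):
    # X holds one data point per column.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # 1) Sampling: pick a small in-sample subset uniformly at random.
    in_idx = rng.choice(n, size=n_samples, replace=False)
    out_idx = np.setdiff1d(np.arange(n), in_idx)
    X_in, X_out = X[:, in_idx], X[:, out_idx]
    # 2) Clustering: spectral clustering on the in-sample similarity graph.
    in_labels = spectral_labels(affinity(X_in, lam), k)
    labels = np.empty(n, dtype=int)
    labels[in_idx] = in_labels
    if out_idx.size:
        # 3) Coding: represent every out-of-sample point over the in-sample data.
        C = ridge_codes(X_in, X_out, lam)
        # 4) Classifying: assign to the cluster with the smallest residual.
        residuals = np.stack([
            np.linalg.norm(X_out - X_in[:, in_labels == j] @ C[in_labels == j], axis=0)
            for j in range(k)
        ])
        labels[out_idx] = residuals.argmin(axis=0)
    return labels
```

In this sketch the cubic-cost step (the eigendecomposition) runs only on the `n_samples` in-sample points, while each remaining point costs one linear solve against the in-sample data, which is the sense in which the large-scale problem is reduced to an out-of-sample one.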
